DataFrame basic check function -
1: info():
<class 'pandas.core.frame.DataFrame'>
Int64Index: 20000 entries, 143572 to 138606
Data columns (total 15 columns):
 #   Column             Non-Null Count  Dtype
---  ------             --------------  -----
 0   SaleID             20000 non-null  int64
 1   regDate            20000 non-null  int64
 2   model              20000 non-null  int64
 3   brand              20000 non-null  int64
 4   bodyType           20000 non-null  int64
 5   fuelType           20000 non-null  float64
 6   gearbox            20000 non-null  object
 7   power              20000 non-null  object
 8   kilometer          20000 non-null  object
 9   notRepairedDamage  20000 non-null  object
 10  regionCode         20000 non-null  int64
 11  seller             20000 non-null  int64
 12  offerType          20000 non-null  float64
 13  creatDate          20000 non-null  float64
 14  price              20000 non-null  float64
dtypes: float64(4), int64(7), object(4)
memory usage: 2.4+ MB
None
2: describe():
              SaleID       regDate         model         brand      bodyType  \
count   20000.000000  2.000000e+04  20000.000000  20000.000000  20000.000000
mean    75513.315150  2.003381e+07     47.727300      8.094150      1.846000
std     43180.721509  5.365699e+04     49.676568      7.848335      3.763083
min         3.000000  1.991000e+07      0.000000      0.000000      0.000000
25%     38595.750000  1.999091e+07     11.000000      1.000000      0.000000
50%     75645.000000  2.003090e+07     30.000000      6.000000      1.000000
75%    113090.750000  2.007110e+07     66.000000     13.000000      3.000000
max    149996.000000  2.015121e+07    247.000000     39.000000    156.000000
           fuelType    regionCode        seller     offerType      creatDate  \
count  20000.000000  2.000000e+04  2.000000e+04  2.000000e+04   2.000000e+04
mean       1.390075  2.150097e+05  2.812609e+05  1.457643e+06   1.820903e+07
std       13.485880  2.059593e+06  2.364529e+06  5.221320e+06   5.960597e+06
min        0.000000  0.000000e+00  0.000000e+00  0.000000e+00  -4.169894e+00
25%        0.000000  7.200000e+02  0.000000e+00  0.000000e+00   2.016031e+07
50%        0.000000  1.992000e+03  0.000000e+00  0.000000e+00   2.016032e+07
75%        1.000000  3.672000e+03  0.000000e+00  0.000000e+00   2.016033e+07
max     1103.000000  2.016041e+07  2.016040e+07  2.016041e+07   2.016041e+07
              price
count  20000.000000
mean    5643.327486
std     7507.319472
min       -3.902379
25%      999.000000
50%     2950.000000
75%     7490.000000
max    99900.000000
3: head():
        SaleID   regDate  model  brand  bodyType  fuelType  gearbox  power  \
143572  143572  20030007    180     13         3       0.0      128     15
82758    82758  20080207     64     21         1       2.0        0     67
3479      3479  20040611     13      4         0       1.0        1    218
89329    89329  20070606     26     14         4       0.0        0    140
90675    90675  20020112     88     14         1       0.0        0     74
        kilometer  notRepairedDamage  regionCode  seller   offerType  \
143572          1               6550           0       0  20160312.0
82758          15                  0        1696       0         0.0
3479           15                  0        1844       0         0.0
89329           9                  0        6864       0         0.0
90675          15                  0        3545       0         0.0
         creatDate        price
143572      1000.0    43.102277
82758   20160312.0  1800.000000
3479    20160328.0  6700.000000
89329   20160403.0  6500.000000
90675   20160404.0  1990.000000
4: shape:
(20000, 15)
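The four checks above could be produced by a small helper along these lines. This is a sketch, not the original code: the name `basic_check`, the `n_rows` parameter, and the stand-in `demo` frame are assumptions made for illustration. Calling `df.info()` directly rather than inside `print()` avoids the stray `None` line, because `info()` prints its report itself and returns `None`.

```python
import pandas as pd


def basic_check(df: pd.DataFrame, n_rows: int = 5) -> None:
    """Print info, describe, head and shape for a DataFrame (hypothetical helper)."""
    print("1: info():")
    df.info()  # prints on its own; wrapping it in print() would also emit "None"
    print("2: describe():")
    print(df.describe())  # numeric columns only by default
    print("3: head():")
    print(df.head(n_rows))
    print("4: shape:")
    print(df.shape)


# Tiny stand-in frame (assumed values); the real data has 20000 rows and 15 columns.
demo = pd.DataFrame(
    {
        "SaleID": [1, 2, 3],
        "gearbox": ["0", "1", "-"],
        "price": [500.0, 1500.0, 2500.0],
    }
)
basic_check(demo)
```

By default `describe()` skips object columns, which would explain why gearbox, power, kilometer and notRepairedDamage do not appear in the describe() output; passing `include="all"` would add them.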
Value counts for each feature -
5: SaleID value_counts():
67583 1
45619 1
78823 1
79767 1
121390 1
..
11583 1
54592 1
89409 1
35309 1
65536 1
Name: SaleID, Length: 20000, dtype: int64
6: regDate value_counts():
20000001 29
20000008 27
20000002 26
20000010 24
20000004 24
..
20130911 1
19920201 1
19941011 1
19920507 1
20070205 1
Name: regDate, Length: 3558, dtype: int64
7: model value_counts():
0 1568
19 1250
4 1104
1 804
29 690
...
216 1
235 1
236 1
237 1
231 1
Name: model, Length: 242, dtype: int64
8: brand value_counts():
0 4172
4 2227
14 2202
10 1922
1 1820
6 1357
9 965
5 586
13 498
11 384
3 336
16 304
7 298
8 293
25 283
27 272
21 193
19 187
15 187
20 174
22 163
12 156
26 128
17 126
30 125
24 111
28 95
32 83
29 52
2 49
31 42
18 38
37 38
34 30
33 30
36 26
23 22
35 19
38 6
39 1
Name: brand, dtype: int64
9: bodyType value_counts():
0 6076
1 4776
2 4021
3 1757
4 1228
5 1067
6 884
7 169
60 4
150 3
55 1
71 1
156 1
56 1
101 1
136 1
100 1
68 1
89 1
90 1
75 1
65 1
140 1
80 1
125 1
Name: bodyType, dtype: int64
10: fuelType value_counts():
0.0 12928
1.0 6356
2.0 296
15.0 117
0.5 44
...
131.0 1
68.0 1
155.0 1
65.0 1
87.0 1
Name: fuelType, Length: 79, dtype: int64
11: gearbox value_counts():
0 14306
1 4242
- 197
15 197
75 90
...
439 1
367 1
245 1
333 1
286 1
Name: gearbox, Length: 158, dtype: int64
12: power value_counts():
75 1173
0 1114
15 1067
150 822
140 764
...
2623 1
3062 1
466 1
2154 1
950 1
Name: power, Length: 511, dtype: int64
13: kilometer value_counts():
15 11468
12.5 1920
0 904
10 777
9 697
...
6378 1
4329 1
480 1
2013 1
538 1
Name: kilometer, Length: 280, dtype: int64
14: notRepairedDamage value_counts():
0 14454
- 2455
1 1645
72 9
486 7
...
4046 1
991 1
5936 1
4444 1
6518 1
Name: notRepairedDamage, Length: 1170, dtype: int64
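The `-` entries in gearbox and notRepairedDamage are a likely reason these columns have dtype object rather than a numeric type. One way to list such non-numeric tokens is sketched below, using `pd.to_numeric` with `errors="coerce"`. The sample values are stand-ins assumed for illustration, not rows from the real data.

```python
import pandas as pd

# Stand-in values mimicking the column above (assumed, not taken from the real file).
raw = pd.Series(["0", "-", "1", "0", "72"], name="notRepairedDamage")

# errors="coerce" turns every value that cannot be parsed as a number into NaN.
numeric = pd.to_numeric(raw, errors="coerce")

# The tokens that failed to parse, with how often each occurs.
bad_tokens = raw[numeric.isna()].value_counts()
print(bad_tokens)
print(numeric.dtype)
```

The original column is left untouched here; whether `-` should be treated as missing is a separate decision that this check does not make.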
15: regionCode value_counts():
0 1729
419 45
764 32
125 25
450 22
...
2150 1
2134 1
3887 1
4149 1
5925 1
Name: regionCode, Length: 5482, dtype: int64
16: seller value_counts():
0 19510
20160310 17
20160331 16
20160401 14
20160314 12
...
5800 1
2100 1
75 1
699 1
1999 1
Name: seller, Length: 134, dtype: int64
17: offerType value_counts():
0.000000e+00 18064
2.016031e+07 66
2.016032e+07 63
2.016031e+07 59
2.016032e+07 59
...
9.700000e+03 1
5.600000e+03 1
7.990000e+03 1
3.545805e+01 1
6.399000e+03 1
Name: offerType, Length: 380, dtype: int64
18: creatDate value_counts():
2.016040e+07 724
2.016031e+07 683
2.016040e+07 659
2.016033e+07 658
2.016032e+07 646
...
9.490000e+02 1
-3.460437e+00 1
-3.874060e+00 1
4.667299e+01 1
1.753440e+00 1
Name: creatDate, Length: 893, dtype: int64
19: price value_counts():
500.000000 284
1500.000000 249
2500.000000 230
1000.000000 205
3500.000000 203
...
1720.000000 1
755.000000 1
9870.000000 1
10150.000000 1
-3.239698 1
Name: price, Length: 3489, dtype: int64
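The per-column listings numbered 5 to 19 above look like the output of a loop over the columns. A minimal sketch of such a loop follows; the function name `print_value_counts`, the `start` parameter, and the stand-in `demo` frame are assumptions, and the original code may differ.

```python
import pandas as pd


def print_value_counts(df: pd.DataFrame, start: int = 5) -> None:
    """Print value_counts() for every column, numbered from `start` (hypothetical helper)."""
    for number, column in enumerate(df.columns, start=start):
        print(f"{number}: {column} value_counts():")
        # value_counts() sorts by frequency, most common value first.
        print(df[column].value_counts())


# Tiny stand-in frame (assumed values); the real data has 15 columns.
demo = pd.DataFrame(
    {
        "gearbox": ["0", "0", "1", "-"],
        "price": [500.0, 500.0, 1500.0, 2500.0],
    }
)
print_value_counts(demo)
```

`value_counts()` leaves out missing values unless `dropna=False` is passed, so a column with NaN entries would show counts that sum to less than the row count.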